Effective Skyline Cardinality Estimation on Data Streams
Authors
Abstract
To incorporate the skyline operator into a data stream engine, we need to address the problem of skyline cardinality estimation, which is essential for extending the query optimizer’s cost model to accommodate skyline queries. In this paper, we propose robust approaches for estimating the skyline cardinality over sliding windows in a stream environment. We first design an approach that estimates the skyline cardinality over uniformly distributed data, and then extend it to support arbitrarily distributed data. Because our approaches place no restriction on the data distribution, they can be applied to extend the optimizer’s cost model. To estimate the skyline cardinality in an online manner, the live elements in the sliding window are sketched using Spectral Bloom Filters, which efficiently and effectively capture the information essential for the estimate. An extensive experimental study demonstrates that our approaches significantly outperform previous approaches.
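To make the estimated quantity concrete, the following is a minimal sketch of the exact skyline cardinality over a count-based sliding window. It is a naive quadratic baseline, not the sketch-based estimator proposed in the paper. The function names, the count-based window, and the convention that smaller values are better in every dimension are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q.

    Assumption: smaller is better in every dimension. p dominates q when it
    is no worse in all dimensions and strictly better in at least one.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline_cardinality(points: Iterable[Point]) -> int:
    """Count the points that no other point dominates (naive O(n^2) scan)."""
    pts = list(points)
    return sum(1 for q in pts if not any(dominates(p, q) for p in pts))


def sliding_skyline_cardinalities(
    stream: Iterable[Point], window_size: int
) -> Iterator[int]:
    """Yield the exact skyline cardinality after each arrival.

    The window keeps only the most recent `window_size` elements; older
    elements expire automatically because the deque has a maximum length.
    """
    window: Deque[Point] = deque(maxlen=window_size)
    for point in stream:
        window.append(point)
        yield skyline_cardinality(window)


if __name__ == "__main__":
    stream = [(1, 5), (2, 4), (3, 3), (0, 0), (4, 4)]
    print(list(sliding_skyline_cardinalities(stream, window_size=3)))
```

Recomputing the skyline on every arrival costs time quadratic in the window size, which is the kind of overhead that a cardinality estimate built from a compact summary of the window is meant to avoid.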
Similar resources
On Estimating the Maximum Domination Value and the Skyline Cardinality of Multi-Dimensional Data Sets
In recent years there has been increasing interest in query processing techniques that use the dominance relationship between items to select the most promising ones, based on user preferences. Skyline and top-k dominating queries are examples of such techniques. A skyline query computes the items that are not dominated, whereas a top-k dominating query returns the k items with...
Skyline Operator on Anti-correlated Distributions
Finding the skyline in a multi-dimensional space is relevant to a wide range of applications. The skyline operator over a set of d-dimensional points selects the points that are not dominated by any other point on all dimensions. Therefore, it provides a minimal set of candidates for the users to make their personal trade-off among all optimal solutions. The existing algorithms establish both t...
An Effective Probabilistic Skyline Query Process on Uncertain Data Streams
With the evolution of technology, the ways to acquire data and the applications of data have become more diverse. As data volume continuously grows, data quality may not be as high as usual. The data can be defective, imprecise, or inaccurate owing to the acquisition process. Recently, the skyline query has been widely used in data analysis to derive results that meet more than one specific condition...
An Algorithm for Retrieving Skyline Points based on User Specified Constraints using the Skyline Ordering
Given a multidimensional data set, a skyline query returns the interesting points that are not dominated by other points. The actual cardinality (s) of a skyline query result may vary substantially from the desired result cardinality (k). An approach called skyline ordering is used, which forms a skyline-based partitioning of a given data set and provides an ordering among the partitions. The con...
Skyline Ordering: A Flexible Framework for Efficient Resolution of Size Constraints on Skyline Queries
Given a set of multi-dimensional points, a skyline query returns the interesting points that are not dominated by other points. It has been observed that the actual cardinality (s) of a skyline query result may differ substantially from the desired result cardinality (k), which has prompted studies on how to reduce s for the case where k < s. This paper goes further by addressing the general ca...
Publication date: 2008